Overview

Dataset statistics

Number of variables: 22
Number of observations: 7109
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 3.8 MiB
Average record size in memory: 558.7 B

Variable types

NUM: 11
CAT: 8
DATE: 2
BOOL: 1

Reproduction

Analysis started: 2022-09-12 09:45:18.235497
Analysis finished: 2022-09-12 09:45:56.889383
Duration: 38.65 seconds
Version: pandas-profiling v2.7.1
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml
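
The reported duration can be cross-checked from the two timestamps above. The following is a minimal sketch using only the Python standard library; the variable names are illustrative and not part of the report.

```python
from datetime import datetime

# Timestamp layout used by the "Analysis started/finished" fields.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2022-09-12 09:45:18.235497", FMT)
finished = datetime.strptime("2022-09-12 09:45:56.889383", FMT)

# Elapsed wall-clock time between the two timestamps, in seconds.
duration_seconds = (finished - started).total_seconds()
print(f"Duration: {duration_seconds:.2f} seconds")
```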

Warnings

N_ROOM is highly correlated with INT_SQFT (High correlation)
INT_SQFT is highly correlated with N_ROOM (High correlation)

Variables

AREA
Categorical

Distinct count: 7
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 55.7 KiB

Value       Count  Frequency (%)
Chrompet    1702   23.9%
Karapakkam  1366   19.2%
KK Nagar    997    14.0%
Velachery   981    13.8%
Anna Nagar  788    11.1%
Adyar       774    10.9%
T Nagar     501    7.0%
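
The Frequency (%) column is each count divided by the number of observations, and the "Other values (2)" bucket of 1275 shown in the report's chart is the sum of the two least frequent areas. A minimal sketch of that arithmetic, with the counts copied from the AREA table (variable names are illustrative):

```python
# Counts copied from the AREA frequency table.
counts = {
    "Chrompet": 1702,
    "Karapakkam": 1366,
    "KK Nagar": 997,
    "Velachery": 981,
    "Anna Nagar": 788,
    "Adyar": 774,
    "T Nagar": 501,
}

n_observations = sum(counts.values())

# Frequency in percent, rounded to one decimal as in the table.
frequency = {area: round(100 * c / n_observations, 1) for area, c in counts.items()}

# Everything outside the five most frequent values is grouped as "Other values".
top_five = sorted(counts.values(), reverse=True)[:5]
other_values = n_observations - sum(top_five)

for area, pct in frequency.items():
    print(f"{area:<11}{counts[area]:>6}{pct:>7}%")
print("Other values:", other_values)
```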
 

Length

Max length: 10
Mean length: 8.346884231
Min length: 5

Value             Count  Frequency (%)
Lowercase_Letter  15     68.2%
Uppercase_Letter  6      27.3%
Space_Separator   1      4.5%

Value   Count  Frequency (%)
Latin   21     95.5%
Common  1      4.5%

Value  Count  Frequency (%)
ASCII  22     100.0%
 

INT_SQFT
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 1699
Unique (%): 23.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1382.0730060486708
Minimum: 500
Maximum: 2500
Zeros: 0
Zeros (%): 0.0%
Memory size: 55.7 KiB

Quantile statistics

Minimum: 500
5-th percentile: 702
Q1: 993
Median: 1373
Q3: 1744
95-th percentile: 2084.6
Maximum: 2500
Range: 2000
Interquartile range (IQR): 751

Descriptive statistics

Standard deviation: 457.4109025
Coefficient of variation (CV): 0.3309600147
Kurtosis: -0.8863792596
Mean: 1382.073006
Median Absolute Deviation (MAD): 376
Skewness: 0.1312376308
Sum: 9825157
Variance: 209224.7337
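
Several of these statistics are derived from one another: the mean is the sum over the number of observations, the range is maximum minus minimum, the IQR is Q3 minus Q1, the CV is the standard deviation over the mean, and the variance is the squared standard deviation. A minimal sketch that re-derives them from the INT_SQFT figures above (the variable names are illustrative, and small differences come from the rounding of the reported inputs):

```python
# Figures copied from the INT_SQFT summary.
n_observations = 7109
total = 9825157
minimum, maximum = 500, 2500
q1, q3 = 993, 1744
std = 457.4109025

mean = total / n_observations   # Mean = Sum / N
value_range = maximum - minimum # Range = Maximum - Minimum
iqr = q3 - q1                   # IQR = Q3 - Q1
cv = std / mean                 # CV = standard deviation / mean
variance = std ** 2             # Variance = standard deviation squared

print(f"Mean:     {mean:.6f}")
print(f"Range:    {value_range}")
print(f"IQR:      {iqr}")
print(f"CV:       {cv:.10f}")
print(f"Variance: {variance:.4f}")
```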
Histogram with fixed size bins (bins=10)
Value                Count  Frequency (%)
1781                 18     0.3%
1538                 15     0.2%
1514                 13     0.2%
1505                 13     0.2%
786                  12     0.2%
961                  12     0.2%
1081                 12     0.2%
1634                 12     0.2%
1655                 12     0.2%
1622                 11     0.2%
Other values (1689)  6979   98.2%

Value  Count  Frequency (%)
500    3      < 0.1%
501    2      < 0.1%
502    1      < 0.1%
504    2      < 0.1%
505    1      < 0.1%

Value  Count  Frequency (%)
2500   1      < 0.1%
2499   1      < 0.1%
2498   1      < 0.1%
2497   1      < 0.1%
2496   3      < 0.1%
 

DATE_SALE
Date

Distinct count: 2798
Unique (%): 39.4%
Missing: 0
Missing (%): 0.0%
Memory size: 55.7 KiB
Minimum: 2004-01-02 00:00:00
Maximum: 2015-12-02 00:00:00
Histogram

DIST_MAINROAD
Real number (ℝ≥0)

Distinct count: 201
Unique (%): 2.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 99.60317906878605
Minimum: 0
Maximum: 200
Zeros: 33
Zeros (%): 0.5%
Memory size: 55.7 KiB

Quantile statistics

Minimum: 0
5-th percentile: 10
Q1: 50
Median: 99
Q3: 148
95-th percentile: 190
Maximum: 200
Range: 200
Interquartile range (IQR): 98

Descriptive statistics

Standard deviation: 57.40310959
Coefficient of variation (CV): 0.5763180465
Kurtosis: -1.165240378
Mean: 99.60317907
Median Absolute Deviation (MAD): 49
Skewness: 0.01814383556
Sum: 708079
Variance: 3295.11699

Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
39                  56     0.8%
51                  53     0.7%
78                  52     0.7%
77                  49     0.7%
14                  48     0.7%
156                 48     0.7%
73                  48     0.7%
49                  47     0.7%
111                 47     0.7%
190                 46     0.6%
Other values (191)  6615   93.1%

Value  Count  Frequency (%)
0      33     0.5%
1      28     0.4%
2      44     0.6%
3      27     0.4%
4      46     0.6%

Value  Count  Frequency (%)
200    38     0.5%
199    30     0.4%
198    30     0.4%
197    38     0.5%
196    36     0.5%
 

N_BEDROOM
Categorical

Distinct count: 4
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 55.7 KiB

Value  Count  Frequency (%)
1      3796   53.4%
2      2352   33.1%
3      707    9.9%
4      254    3.6%

Length

Max length: 1
Mean length: 1
Min length: 1

Value           Count  Frequency (%)
Decimal_Number  4      100.0%

Value   Count  Frequency (%)
Common  4      100.0%

Value  Count  Frequency (%)
ASCII  4      100.0%
 

N_BATHROOM
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 55.7 KiB

Value  Count  Frequency (%)
1      5594   78.7%
2      1515   21.3%

Length

Max length: 1
Mean length: 1
Min length: 1

Value           Count  Frequency (%)
Decimal_Number  2      100.0%

Value   Count  Frequency (%)
Common  2      100.0%

Value  Count  Frequency (%)
ASCII  2      100.0%
 

N_ROOM
Real number (ℝ≥0)

HIGH CORRELATION
Distinct count: 5
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.6887044591363063
Minimum: 2
Maximum: 6
Zeros: 0
Zeros (%): 0.0%
Memory size: 55.7 KiB

Quantile statistics

Minimum: 2
5-th percentile: 2
Q1: 3
Median: 4
Q3: 4
95-th percentile: 5
Maximum: 6
Range: 4
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.019098916
Coefficient of variation (CV): 0.2762755671
Kurtosis: -0.5307863127
Mean: 3.688704459
Median Absolute Deviation (MAD): 1
Skewness: 0.1188007656
Sum: 26223
Variance: 1.038562601

Histogram with fixed size bins (bins=10)

Value  Count  Frequency (%)
4      2563   36.1%
3      2125   29.9%
5      1246   17.5%
2      921    13.0%
6      254    3.6%

Value  Count  Frequency (%)
2      921    13.0%
3      2125   29.9%
4      2563   36.1%
5      1246   17.5%
6      254    3.6%

Value  Count  Frequency (%)
6      254    3.6%
5      1246   17.5%
4      2563   36.1%
3      2125   29.9%
2      921    13.0%
 

SALE_COND
Categorical

Distinct count: 5
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 55.7 KiB

Value        Count  Frequency (%)
adj land     1439   20.2%
partial      1433   20.2%
normal sale  1423   20.0%
abnormal     1411   19.8%
family       1403   19.7%

Length

Max length: 11
Mean length: 8.004220003
Min length: 6

Value             Count  Frequency (%)
Lowercase_Letter  16     94.1%
Space_Separator   1      5.9%

Value   Count  Frequency (%)
Latin   16     94.1%
Common  1      5.9%

Value  Count  Frequency (%)
ASCII  17     100.0%
 

PARK_FACIL
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 55.7 KiB

Value  Count  Frequency (%)
Yes    3587   50.5%
No     3522   49.5%
 

DATE_BUILD
Date

Distinct count: 5808
Unique (%): 81.7%
Missing: 0
Missing (%): 0.0%
Memory size: 55.7 KiB
Minimum: 1949-10-28 00:00:00
Maximum: 2010-12-11 00:00:00
Histogram

BUILDTYPE
Categorical

Distinct count: 3
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 55.7 KiB

Value       Count  Frequency (%)
House       2444   34.4%
Others      2336   32.9%
Commercial  2329   32.8%

Length

Max length: 10
Mean length: 6.966661978
Min length: 5

Value             Count  Frequency (%)
Lowercase_Letter  12     80.0%
Uppercase_Letter  3      20.0%

Value  Count  Frequency (%)
Latin  15     100.0%

Value  Count  Frequency (%)
ASCII  15     100.0%
 

UTILITY_AVAIL
Categorical

Distinct count: 3
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 55.7 KiB

Value    Count  Frequency (%)
NoSewr   3700   52.0%
All Pub  1887   26.5%
ELO      1522   21.4%

Length

Max length: 7
Mean length: 5.623153749
Min length: 3

Value             Count  Frequency (%)
Lowercase_Letter  7      46.7%
Uppercase_Letter  7      46.7%
Space_Separator   1      6.7%

Value   Count  Frequency (%)
Latin   14     93.3%
Common  1      6.7%

Value  Count  Frequency (%)
ASCII  15     100.0%
 

STREET
Categorical

Distinct count: 3
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 55.7 KiB

Value      Count  Frequency (%)
Paved      2572   36.2%
Gravel     2520   35.4%
No Access  2017   28.4%

Length

Max length: 9
Mean length: 6.48937966
Min length: 5

Value             Count  Frequency (%)
Lowercase_Letter  9      64.3%
Uppercase_Letter  4      28.6%
Space_Separator   1      7.1%

Value   Count  Frequency (%)
Latin   13     92.9%
Common  1      7.1%

Value  Count  Frequency (%)
ASCII  14     100.0%
 

MZZONE
Categorical

Distinct count: 6
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 55.7 KiB

Value  Count  Frequency (%)
RL     1858   26.1%
RH     1822   25.6%
RM     1817   25.6%
C      550    7.7%
A      537    7.6%
I      525    7.4%

Length

Max length: 2
Mean length: 1.773245182
Min length: 1

Value             Count  Frequency (%)
Uppercase_Letter  7      100.0%

Value  Count  Frequency (%)
Latin  7      100.0%

Value  Count  Frequency (%)
ASCII  7      100.0%
 

QS_ROOMS
Real number (ℝ≥0)

Distinct count: 31
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.5174708116472075
Minimum: 2.0
Maximum: 5.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 55.7 KiB

Quantile statistics

Minimum: 2
5-th percentile: 2.1
Q1: 2.7
Median: 3.5
Q3: 4.3
95-th percentile: 4.9
Maximum: 5
Range: 3
Interquartile range (IQR): 1.6

Descriptive statistics

Standard deviation: 0.8919724311
Coefficient of variation (CV): 0.2535834635
Kurtosis: -1.197535123
Mean: 3.517470812
Median Absolute Deviation (MAD): 0.8
Skewness: -0.01895704371
Sum: 25005.7
Variance: 0.7956148178

Histogram with fixed size bins (bins=10)

Value              Count  Frequency (%)
2.5                265    3.7%
3.8                259    3.6%
3.6                255    3.6%
4.6                252    3.5%
3.9                245    3.4%
4.9                242    3.4%
3.4                240    3.4%
4.8                239    3.4%
4.2                239    3.4%
3.3                239    3.4%
Other values (21)  4634   65.2%

Value  Count  Frequency (%)
2      203    2.9%
2.1    236    3.3%
2.2    213    3.0%
2.3    224    3.2%
2.4    208    2.9%

Value  Count  Frequency (%)
5      228    3.2%
4.9    242    3.4%
4.8    239    3.4%
4.7    239    3.4%
4.6    252    3.5%
 

QS_BATHROOM
Real number (ℝ≥0)

Distinct count: 31
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.507244338162892
Minimum: 2.0
Maximum: 5.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 55.7 KiB

Quantile statistics

Minimum: 2
5-th percentile: 2.1
Q1: 2.7
Median: 3.5
Q3: 4.3
95-th percentile: 4.9
Maximum: 5
Range: 3
Interquartile range (IQR): 1.6

Descriptive statistics

Standard deviation: 0.8978337054
Coefficient of variation (CV): 0.2559940565
Kurtosis: -1.21625135
Mean: 3.507244338
Median Absolute Deviation (MAD): 0.8
Skewness: 0.0003104318578
Sum: 24933
Variance: 0.8061053625

Histogram with fixed size bins (bins=10)

Value              Count  Frequency (%)
2.7                256    3.6%
4.8                255    3.6%
3.7                251    3.5%
4.7                247    3.5%
4.9                245    3.4%
3                  241    3.4%
4.2                237    3.3%
3.4                234    3.3%
2.2                234    3.3%
4.6                234    3.3%
Other values (21)  4675   65.8%

Value  Count  Frequency (%)
2      222    3.1%
2.1    224    3.2%
2.2    234    3.3%
2.3    220    3.1%
2.4    230    3.2%

Value  Count  Frequency (%)
5      219    3.1%
4.9    245    3.4%
4.8    255    3.6%
4.7    247    3.5%
4.6    234    3.3%
 

QS_BEDROOM
Real number (ℝ≥0)

Distinct count: 31
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.485300323533549
Minimum: 2.0
Maximum: 5.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 55.7 KiB

Quantile statistics

Minimum: 2
5-th percentile: 2.1
Q1: 2.7
Median: 3.5
Q3: 4.3
95-th percentile: 4.9
Maximum: 5
Range: 3
Interquartile range (IQR): 1.6

Descriptive statistics

Standard deviation: 0.8872664105
Coefficient of variation (CV): 0.2545738755
Kurtosis: -1.190165265
Mean: 3.485300324
Median Absolute Deviation (MAD): 0.8
Skewness: 0.01728160906
Sum: 24777
Variance: 0.7872416831

Histogram with fixed size bins (bins=10)

Value              Count  Frequency (%)
2.6                273    3.8%
3.2                253    3.6%
4                  248    3.5%
2.4                244    3.4%
3.8                244    3.4%
3.1                243    3.4%
2.1                242    3.4%
3                  241    3.4%
3.4                239    3.4%
4.4                237    3.3%
Other values (21)  4645   65.3%

Value  Count  Frequency (%)
2      221    3.1%
2.1    242    3.4%
2.2    237    3.3%
2.3    200    2.8%
2.4    244    3.4%

Value  Count  Frequency (%)
5      217    3.1%
4.9    203    2.9%
4.8    211    3.0%
4.7    228    3.2%
4.6    233    3.3%
 

QS_OVERALL
Real number (ℝ≥0)

Distinct count: 480
Unique (%): 6.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.503253788415239
Minimum: 2.0
Maximum: 4.97
Zeros: 0
Zeros (%): 0.0%
Memory size: 55.7 KiB

Quantile statistics

Minimum: 2
5-th percentile: 2.63
Q1: 3.13
Median: 3.503253788
Q3: 3.88
95-th percentile: 4.37
Maximum: 4.97
Range: 2.97
Interquartile range (IQR): 0.75

Descriptive statistics

Standard deviation: 0.5254397319
Coefficient of variation (CV): 0.1499862024
Kurtosis: -0.4725985847
Mean: 3.503253788
Median Absolute Deviation (MAD): 0.3767462116
Skewness: -0.007287861446
Sum: 24904.63118
Variance: 0.2760869118

Histogram with fixed size bins (bins=10)

Value               Count  Frequency (%)
3.54                59     0.8%
3.26                57     0.8%
3.32                56     0.8%
3.56                55     0.8%
3.36                54     0.8%
3.34                53     0.7%
3.47                51     0.7%
3.2                 51     0.7%
3.96                51     0.7%
3.74                50     0.7%
Other values (470)  6572   92.4%

Value  Count  Frequency (%)
2      1      < 0.1%
2.06   2      < 0.1%
2.09   1      < 0.1%
2.11   1      < 0.1%
2.18   3      < 0.1%

Value  Count  Frequency (%)
4.97   1      < 0.1%
4.95   1      < 0.1%
4.94   1      < 0.1%
4.93   1      < 0.1%
4.9    1      < 0.1%
 

REG_FEE
Real number (ℝ≥0)

Distinct count: 7038
Unique (%): 99.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 376938.33070755383
Minimum: 71177
Maximum: 983922
Zeros: 0
Zeros (%): 0.0%
Memory size: 55.7 KiB

Quantile statistics

Minimum: 71177
5-th percentile: 197984.6
Q1: 272406
Median: 349486
Q3: 451562
95-th percentile: 669167.4
Maximum: 983922
Range: 912745
Interquartile range (IQR): 179156

Descriptive statistics

Standard deviation: 143070.662
Coefficient of variation (CV): 0.3795598652
Kurtosis: 1.126499412
Mean: 376938.3307
Median Absolute Deviation (MAD): 85998
Skewness: 1.037754561
Sum: 2679654593
Variance: 2.046921433e+10

Histogram with fixed size bins (bins=10)

Value                Count  Frequency (%)
235229               3      < 0.1%
348034               2      < 0.1%
353677               2      < 0.1%
441717               2      < 0.1%
330086               2      < 0.1%
518512               2      < 0.1%
222526               2      < 0.1%
424361               2      < 0.1%
257917               2      < 0.1%
264914               2      < 0.1%
Other values (7028)  7088   99.7%

Value   Count  Frequency (%)
71177   1      < 0.1%
95798   1      < 0.1%
103928  1      < 0.1%
106466  1      < 0.1%
111366  1      < 0.1%

Value   Count  Frequency (%)
983922  1      < 0.1%
981117  1      < 0.1%
963029  1      < 0.1%
952411  1      < 0.1%
947124  1      < 0.1%
 

COMMIS
Real number (ℝ≥0)

Distinct count: 7011
Unique (%): 98.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 141005.7265438177
Minimum: 5055
Maximum: 495405
Zeros: 0
Zeros (%): 0.0%
Memory size: 55.7 KiB

Quantile statistics

Minimum: 5055
5-th percentile: 35990.6
Q1: 84219
Median: 127628
Q3: 184506
95-th percentile: 292538
Maximum: 495405
Range: 490350
Interquartile range (IQR): 100287

Descriptive statistics

Standard deviation: 78768.09372
Coefficient of variation (CV): 0.558616275
Kurtosis: 1.073363345
Mean: 141005.7265
Median Absolute Deviation (MAD): 49095
Skewness: 0.9516562165
Sum: 1002409710
Variance: 6204412588

Histogram with fixed size bins (bins=10)

Value                Count  Frequency (%)
117825               3      < 0.1%
231426               2      < 0.1%
95120                2      < 0.1%
75962                2      < 0.1%
145973               2      < 0.1%
92784                2      < 0.1%
48067                2      < 0.1%
185864               2      < 0.1%
97572                2      < 0.1%
286822               2      < 0.1%
Other values (7001)  7088   99.7%

Value  Count  Frequency (%)
5055   1      < 0.1%
5126   1      < 0.1%
5378   1      < 0.1%
5620   1      < 0.1%
5943   1      < 0.1%

Value   Count  Frequency (%)
495405  1      < 0.1%
491961  1      < 0.1%
485924  1      < 0.1%
481001  1      < 0.1%
479297  1      < 0.1%
 

SALES_PRICE
Real number (ℝ≥0)

Distinct count: 7057
Unique (%): 99.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10894909.63918976
Minimum: 2156875
Maximum: 23667340
Zeros: 0
Zeros (%): 0.0%
Memory size: 55.7 KiB

Quantile statistics

Minimum: 2156875
5-th percentile: 5630100
Q1: 8272100
Median: 10335050
Q3: 12993900
95-th percentile: 18790428
Maximum: 23667340
Range: 21510465
Interquartile range (IQR): 4721800

Descriptive statistics

Standard deviation: 3768603.457
Coefficient of variation (CV): 0.345904976
Kurtosis: 0.5881293416
Mean: 10894909.64
Median Absolute Deviation (MAD): 2317605
Skewness: 0.7733433359
Sum: 7.745191262e+10
Variance: 1.420237202e+13

Histogram with fixed size bins (bins=10)

Value                Count  Frequency (%)
9817500              2      < 0.1%
7195550              2      < 0.1%
7629750              2      < 0.1%
8033250              2      < 0.1%
6519000              2      < 0.1%
9213320              2      < 0.1%
7855000              2      < 0.1%
8191250              2      < 0.1%
11930880             2      < 0.1%
9429000              2      < 0.1%
Other values (7047)  7089   99.7%

Value    Count  Frequency (%)
2156875  1      < 0.1%
2476375  1      < 0.1%
2640250  1      < 0.1%
2797250  1      < 0.1%
2939750  1      < 0.1%

Value     Count  Frequency (%)
23667340  1      < 0.1%
23407860  1      < 0.1%
23314580  1      < 0.1%
23307000  1      < 0.1%
23247590  1      < 0.1%
 

HOUSE_AGE
Real number (ℝ≥0)

Distinct count: 1652
Unique (%): 23.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 8867.524265016176
Minimum: 1430.0
Maximum: 20368.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 55.7 KiB

Quantile statistics

Minimum: 1430
5-th percentile: 2190
Q1: 5110
Median: 8583
Q3: 12410
95-th percentile: 16513
Maximum: 20368
Range: 18938
Interquartile range (IQR): 7300

Descriptive statistics

Standard deviation: 4506.780646
Coefficient of variation (CV): 0.5082343743
Kurtosis: -0.8414678596
Mean: 8867.524265
Median Absolute Deviation (MAD): 3590
Skewness: 0.2663986739
Sum: 63039230
Variance: 20311071.79

Histogram with fixed size bins (bins=10)

Value                Count  Frequency (%)
2555                 116    1.6%
1825                 112    1.6%
7300                 104    1.5%
6205                 99     1.4%
5110                 98     1.4%
2920                 98     1.4%
4380                 97     1.4%
6935                 96     1.4%
5840                 96     1.4%
5475                 95     1.3%
Other values (1642)  6098   85.8%

Value  Count  Frequency (%)
1430   11     0.2%
1431   9      0.1%
1432   1      < 0.1%
1433   4      0.1%
1460   25     0.4%

Value  Count  Frequency (%)
20368  1      < 0.1%
20282  2      < 0.1%
20250  1      < 0.1%
20105  1      < 0.1%
20075  6      0.1%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear relationship the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
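
The calculation can be sketched in a few lines of plain Python. The function name is illustrative, and the data are the INT_SQFT and N_ROOM values from the ten "First rows" of the sample, so the result describes only those rows, not the full dataset. The second call rescales and shifts one variable to illustrate the invariance mentioned above.

```python
from math import sqrt

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)

# INT_SQFT and N_ROOM from the ten "First rows" of the sample.
int_sqft = [1004, 1986, 909, 1855, 1226, 1220, 1167, 1847, 771, 1635]
n_room = [3, 5, 3, 5, 3, 4, 3, 5, 2, 4]

r = pearson_r(int_sqft, n_room)

# A positive rescaling plus a shift of one variable leaves r unchanged.
r_rescaled = pearson_r([0.0929 * v + 10 for v in int_sqft], n_room)

print(round(r, 3), round(r_rescaled, 3))
```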

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
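
As a minimal sketch, ρ can be computed by ranking both variables and applying the Pearson formula to the ranks. Assigning tied values the average of their positions is an assumption of this sketch (a common convention), and the function names are illustrative.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation of the rank variables of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)

# A monotonic but nonlinear relationship (y = x cubed).
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y))
```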

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
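
The pair-counting definition above can be sketched directly. This is the simple form without a tie adjustment (often called τ-a); libraries frequently use tie-adjusted variants, so results on data with ties may differ from the report. The function name is illustrative.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, with no tie adjustment."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # direction == 0 means a tie: neither concordant nor discordant
    return (concordant - discordant) / len(pairs)

# Three observations: two concordant pairs, one discordant pair.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```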

Phik (φk)

Phik (φk) is a correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. The report uses a bias-corrected measure proposed by Bergsma in 2013.
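
A minimal sketch of a bias-corrected Cramér's V, following the correction as it is commonly stated for Bergsma (2013): the squared φ and the table dimensions are each adjusted before taking the ratio. Whether the report implements exactly this form is an assumption, and the function name is illustrative. The sketch expects a contingency table with no empty row or column.

```python
from math import sqrt

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for an r x k contingency table (list of rows)."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Corrections to phi squared and to the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))

print(cramers_v_corrected([[10, 10], [10, 10]]))  # independent categories
print(cramers_v_corrected([[50, 0], [0, 50]]))    # perfectly associated categories
```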

Missing values

Sample

First rows

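index | AREA | INT_SQFT | DATE_SALE | DIST_MAINROAD | N_BEDROOM | N_BATHROOM | N_ROOM | SALE_COND | PARK_FACIL | DATE_BUILD | BUILDTYPE | UTILITY_AVAIL | STREET | MZZONE | QS_ROOMS | QS_BATHROOM | QS_BEDROOM | QS_OVERALL | REG_FEE | COMMIS | SALES_PRICE | HOUSE_AGE
0 | Karapakkam | 1004 | 2011-04-05 | 131 | 1 | 1 | 3 | abnormal | Yes | 1967-05-15 | Commercial | All Pub | Paved | A | 4.0 | 3.9 | 4.9 | 4.330 | 380000 | 144400 | 7600000 | 16031.0
1 | Anna Nagar | 1986 | 2006-12-19 | 26 | 2 | 1 | 5 | abnormal | No | 1995-12-22 | Commercial | All Pub | Gravel | RH | 4.9 | 4.2 | 2.5 | 3.765 | 760122 | 304049 | 21717770 | 4015.0
2 | Adyar | 909 | 2012-04-02 | 70 | 1 | 1 | 3 | abnormal | Yes | 1992-09-02 | Commercial | ELO | Gravel | RL | 4.1 | 3.8 | 2.2 | 3.090 | 421094 | 92114 | 13159200 | 7152.0
3 | Velachery | 1855 | 2010-03-13 | 14 | 3 | 2 | 5 | family | No | 1988-03-18 | Others | NoSewr | Paved | I | 4.7 | 3.9 | 3.6 | 4.010 | 356321 | 77042 | 9630290 | 8030.0
4 | Karapakkam | 1226 | 2009-05-10 | 84 | 1 | 1 | 3 | abnormal | Yes | 1979-10-13 | Others | All Pub | Gravel | C | 3.0 | 2.5 | 4.1 | 3.290 | 237000 | 74063 | 7406250 | 10802.0
5 | Chrompet | 1220 | 2014-11-09 | 36 | 2 | 1 | 4 | partial | No | 2009-12-09 | Commercial | NoSewr | No Access | RH | 4.5 | 2.6 | 3.1 | 3.320 | 409027 | 198316 | 12394750 | 1796.0
6 | Chrompet | 1167 | 2007-05-04 | 137 | 1 | 1 | 3 | partial | No | 1979-12-04 | Others | All Pub | No Access | RL | 3.6 | 2.1 | 2.5 | 2.670 | 263152 | 33955 | 8488790 | 10013.0
7 | Velachery | 1847 | 2006-03-13 | 176 | 3 | 2 | 5 | family | No | 1996-03-15 | Commercial | All Pub | Gravel | RM | 2.4 | 4.5 | 2.1 | 3.260 | 604809 | 235204 | 16800250 | 3650.0
8 | Chrompet | 771 | 2011-06-04 | 175 | 1 | 1 | 2 | adj land | No | 1977-04-14 | Others | NoSewr | Paved | RM | 2.9 | 3.7 | 4.0 | 3.550 | 257578 | 33236 | 8308970 | 12469.0
9 | Velachery | 1635 | 2006-06-22 | 74 | 2 | 1 | 4 | abnormal | No | 1991-06-26 | Others | ELO | No Access | I | 3.1 | 3.1 | 3.3 | 3.160 | 323346 | 121255 | 8083650 | 5475.0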
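
In these sample rows, HOUSE_AGE appears to equal the number of days between DATE_BUILD and DATE_SALE. The report does not state this, so treat it as an inference from the sample. A minimal check on the first three rows (variable names are illustrative):

```python
from datetime import date

# (DATE_BUILD, DATE_SALE, reported HOUSE_AGE) for sample rows 0, 1 and 2.
sample_rows = [
    (date(1967, 5, 15), date(2011, 4, 5), 16031),
    (date(1995, 12, 22), date(2006, 12, 19), 4015),
    (date(1992, 9, 2), date(2012, 4, 2), 7152),
]

# Age in days, assuming HOUSE_AGE = DATE_SALE - DATE_BUILD.
computed_ages = [(sold - built).days for built, sold, _ in sample_rows]
reported_ages = [age for _, _, age in sample_rows]

print(computed_ages)
print(computed_ages == reported_ages)
```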

Last rows

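index | AREA | INT_SQFT | DATE_SALE | DIST_MAINROAD | N_BEDROOM | N_BATHROOM | N_ROOM | SALE_COND | PARK_FACIL | DATE_BUILD | BUILDTYPE | UTILITY_AVAIL | STREET | MZZONE | QS_ROOMS | QS_BATHROOM | QS_BEDROOM | QS_OVERALL | REG_FEE | COMMIS | SALES_PRICE | HOUSE_AGE
7099 | Adyar | 895 | 2011-05-01 | 197 | 1 | 1 | 3 | adj land | Yes | 1971-01-15 | House | NoSewr | No Access | I | 3.6 | 4.7 | 4.2 | 4.12 | 250641 | 7372 | 7371800 | 14716.0
7100 | T Nagar | 1733 | 2010-02-24 | 191 | 1 | 1 | 4 | abnormal | Yes | 1985-02-03 | Commercial | NoSewr | No Access | RL | 3.4 | 3.7 | 2.1 | 2.89 | 702058 | 312026 | 19501600 | 9152.0
7101 | Karapakkam | 666 | 2010-11-05 | 51 | 1 | 1 | 2 | adj land | Yes | 1974-05-20 | Others | ELO | Gravel | I | 3.2 | 4.4 | 2.5 | 3.28 | 273317 | 74541 | 6211750 | 13318.0
7102 | Karapakkam | 701 | 2010-03-02 | 100 | 1 | 1 | 2 | abnormal | No | 1990-08-02 | House | NoSewr | Gravel | RH | 4.2 | 3.0 | 2.0 | 2.96 | 282175 | 141088 | 5643500 | 7152.0
7103 | Karapakkam | 1462 | 2010-04-23 | 68 | 2 | 2 | 4 | family | No | 1986-04-29 | Others | NoSewr | Gravel | RM | 2.7 | 3.3 | 3.6 | 3.24 | 356716 | 178358 | 9387250 | 8760.0
7104 | Karapakkam | 598 | 2011-03-01 | 51 | 1 | 1 | 2 | adj land | No | 1962-01-15 | Others | ELO | No Access | RM | 3.0 | 2.2 | 2.4 | 2.52 | 208767 | 107060 | 5353000 | 17942.0
7105 | Velachery | 1897 | 2004-08-04 | 52 | 3 | 2 | 5 | family | Yes | 1995-11-04 | Others | NoSewr | No Access | RH | 3.6 | 4.5 | 3.3 | 3.92 | 346191 | 205551 | 10818480 | 3196.0
7106 | Velachery | 1614 | 2006-08-25 | 152 | 2 | 1 | 4 | normal sale | No | 1978-01-09 | House | NoSewr | Gravel | I | 4.3 | 4.2 | 2.9 | 3.84 | 317354 | 167028 | 8351410 | 10455.0
7107 | Karapakkam | 787 | 2009-03-08 | 40 | 1 | 1 | 2 | partial | Yes | 1977-11-08 | Commercial | ELO | Paved | RL | 4.6 | 3.8 | 4.1 | 4.16 | 425350 | 119098 | 8507000 | 11443.0
7108 | Velachery | 1896 | 2005-07-13 | 156 | 3 | 2 | 5 | partial | Yes | 1961-07-24 | Others | ELO | Paved | I | 3.1 | 3.5 | 4.3 | 3.64 | 349177 | 79812 | 9976480 | 16060.0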